The Use of Parallel and Comparable Data for Analysis of Abstract Anaphora in German and English
Abstract
Parallel corpora (original texts aligned with their translations) are a widely used resource in computational linguistics. Translation studies have shown that translated texts often differ systematically from comparable original texts. Translators tend to be faithful to the structures of the original texts, resulting in a “shining through” of the original language's preferences in the translated text. Translators also tend to make their translations as comprehensible as possible, with the effect that translated texts can be more explicit than their source texts. Motivated by the need to use a parallel resource for cross-linguistic feature induction in abstract anaphora resolution, this paper investigates properties of English and German texts in the Europarl corpus, taking into account both general features such as sentence length and task-dependent features such as the distribution of demonstrative noun phrases. The investigation is based on the entire Europarl corpus as well as on a small, manually annotated subset of it. The results indicate that English translated texts are sufficiently “authentic” to be used as training data for anaphora resolution; the results for German texts are less conclusive.
Similar resources
The DAD Parallel Corpora and their Uses
This paper deals with the uses of the annotations of third person singular neuter pronouns in the DAD parallel and comparable corpora of Danish and Italian texts and spoken data. The annotations contain information about the functions of these pronouns and their uses as abstract anaphora. Abstract anaphora have constructions such as verbal phrases, clauses and discourse segments as antecedents ...
Acquisition of English anaphora by Iranian EFL learners
The present study examined the acquisition of anaphora in English by Iranian EFL learners as well as Persian speaking children. To do so, the study was conducted in three phases. In the first phase, 40 intermediate female and male EFL learners were selected from Puyan Institute in Takestan, Iran. Then, an off-line based Grammatical Judgment Task was administered. In the second phase, 40 female ...
Abstract Anaphors in German and English
Stefanie Dipper, Christine Rieger, Melanie Seiss, and Heike Zinsmeister (Ruhr-University Bochum, 44780 Bochum, Germany; University of Konstanz, 78457 Konstanz, Germany). Abstract anaphors refer to abstract referents such as facts or events. Automatic resolution of this kind of anaphora still poses a problem for language processing systems. The present pa...
Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners
Hedges, as tools to express tentativeness and doubt, have been studied in many research papers in the Iranian EFL research setting. However, their use in a learner corpus portraying Iranian learner English needs more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...
Lexical Cohesion in English and Persian Abstracts
This study compares and contrasts lexical cohesion in English and Persian abstracts of Iranian medical students’ theses to appreciate textualization processes in the two languages. For this purpose, one hundred English and Persian abstracts were selected randomly and analyzed based on Seddigh and Yarmohamadi’s (1996) lexical cohesion framework, a version of Halliday and Hasan’s (1976) and Halli...